Dataset statistics
| Number of variables | 4 |
|---|---|
| Number of observations | 12 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 480.0 B |
| Average record size in memory | 40.0 B |
Variable types
| Numeric | 2 |
|---|---|
| Categorical | 2 |
Warnings
| Period has constant value "2001-2019" | Constant |
|---|---|
| Lake Victoria is highly correlated with Simiyu | High correlation |
| Simiyu is highly correlated with Lake Victoria | High correlation |
| Simiyu is highly correlated with Month | High correlation |
| Month is highly correlated with Simiyu and 1 other field | High correlation |
| Lake Victoria is highly correlated with Month | High correlation |
| Month is highly correlated with Period | High correlation |
| Period is highly correlated with Month | High correlation |
| Month is uniformly distributed | Uniform |
| Lake Victoria has unique values | Unique |
| Simiyu has unique values | Unique |
| Month has unique values | Unique |
Reproduction
| Analysis started | 2022-05-05 15:11:44.021659 |
|---|---|
| Analysis finished | 2022-05-05 15:11:51.568778 |
| Duration | 7.55 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
Lake Victoria
Real number (ℝ≥0)
HIGH CORRELATION, UNIQUE
| Distinct | 12 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.524877193 |

| Minimum | 1.764421053 |
|---|---|
| Maximum | 9.362789474 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 192.0 B |
Quantile statistics
| Minimum | 1.764421053 |
|---|---|
| 5-th percentile | 2.340936842 |
| Q1 | 3.366657895 |
| median | 4.0735 |
| Q3 | 5.168460526 |
| 95-th percentile | 8.065744737 |
| Maximum | 9.362789474 |
| Range | 7.598368421 |
| Interquartile range (IQR) | 1.801802632 |
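The quantile rows above can be checked directly against the twelve monthly values. The sketch below illustrates the calculation; it is not pandas-profiling's own code. It assumes the Lake Victoria values as transcribed from the value tables (ordered Jan to Dec, as in the sample rows) and uses a hypothetical helper, `quantile_summary`. It calls `statistics.quantiles(..., method="inclusive")`, which interpolates linearly at position p * (n - 1) in the sorted data. That is the same convention as pandas' default `Series.quantile`, and it reproduces the table's 5-th percentile, Q1, median, Q3 and 95-th percentile.

```python
import statistics

# Lake Victoria values (Jan..Dec), transcribed from the value tables above.
LAKE_VICTORIA = [
    3.176, 3.477, 4.687052632, 7.004526316, 9.362789474, 3.430210526,
    1.764421053, 2.812631579, 3.978894737, 5.318421053, 5.118473684, 4.168105263,
]


def quantile_summary(values):
    """Quantile statistics via linear interpolation (hypothetical helper).

    method="inclusive" places the p-th quantile at position p * (n - 1)
    of the sorted data, the same convention as pandas' Series.quantile default.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    ventiles = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "min": min(values),
        "p5": ventiles[0],    # first of 19 cut points -> 5th percentile
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],  # last cut point -> 95th percentile
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


summary = quantile_summary(LAKE_VICTORIA)
for name, value in summary.items():
    print(f"{name:>6}: {value:.9f}")
```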
Descriptive statistics
| Standard deviation | 2.037278089 |
|---|---|
| Coefficient of variation (CV) | 0.4502394214 |
| Kurtosis | 1.993378036 |
| Mean | 4.524877193 |
| Median Absolute Deviation (MAD) | 0.971236842 |
| Skewness | 1.264509715 |
| Sum | 54.29852632 |
| Variance | 4.150502013 |
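The descriptive statistics can be reproduced the same way. This minimal sketch uses a hypothetical helper, `descriptive_summary`. It assumes sample (n - 1) estimators for the variance and standard deviation, plus the bias-adjusted skewness (G1) and excess kurtosis (G2) estimators that pandas' `Series.skew` and `Series.kurt` use. With the transcribed Lake Victoria values, these reproduce the table to the printed precision. MAD here is the raw median of |x - median| with no normal-consistency scale factor, which matches the 0.971236842 shown.

```python
import statistics

# Lake Victoria values (Jan..Dec), transcribed from the value tables above.
LAKE_VICTORIA = [
    3.176, 3.477, 4.687052632, 7.004526316, 9.362789474, 3.430210526,
    1.764421053, 2.812631579, 3.978894737, 5.318421053, 5.118473684, 4.168105263,
]


def descriptive_summary(values):
    """Descriptive statistics for a numeric column (hypothetical helper)."""
    n = len(values)
    if n < 4:
        raise ValueError("the kurtosis estimator needs at least 4 values")
    mean = statistics.fmean(values)
    variance = statistics.variance(values)  # sample variance: divides by n - 1
    std = variance ** 0.5
    median = statistics.median(values)
    # Median absolute deviation: median of |x - median|, no scale factor.
    mad = statistics.median([abs(x - median) for x in values])

    # Sums of powers of the deviations from the mean.
    deviations = [x - mean for x in values]
    s2 = sum(d ** 2 for d in deviations)
    s3 = sum(d ** 3 for d in deviations)
    s4 = sum(d ** 4 for d in deviations)

    # Adjusted Fisher-Pearson skewness (G1).
    skewness = n * (n - 1) ** 0.5 / (n - 2) * s3 / s2 ** 1.5
    # Bias-adjusted excess kurtosis (G2); near 0 for large normal samples.
    kurtosis = (
        n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return {
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "kurtosis": kurtosis,
        "mean": mean,
        "mad": mad,
        "skewness": skewness,
        "sum": sum(values),
        "variance": variance,
    }


stats = descriptive_summary(LAKE_VICTORIA)
for name, value in stats.items():
    print(f"{name:>9}: {value:.9f}")
```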
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.362789474 | 1 | 8.3% |
| 5.318421053 | 1 | 8.3% |
| 5.118473684 | 1 | 8.3% |
| 4.168105263 | 1 | 8.3% |
| 1.764421053 | 1 | 8.3% |
| 4.687052632 | 1 | 8.3% |
| 2.812631579 | 1 | 8.3% |
| 3.477 | 1 | 8.3% |
| 3.176 | 1 | 8.3% |
| 3.978894737 | 1 | 8.3% |
| Other values (2) | 2 | 16.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.764421053 | 1 | 8.3% |
| 2.812631579 | 1 | 8.3% |
| 3.176 | 1 | 8.3% |
| 3.430210526 | 1 | 8.3% |
| 3.477 | 1 | 8.3% |
| 3.978894737 | 1 | 8.3% |
| 4.168105263 | 1 | 8.3% |
| 4.687052632 | 1 | 8.3% |
| 5.118473684 | 1 | 8.3% |
| 5.318421053 | 1 | 8.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.362789474 | 1 | 8.3% |
| 7.004526316 | 1 | 8.3% |
| 5.318421053 | 1 | 8.3% |
| 5.118473684 | 1 | 8.3% |
| 4.687052632 | 1 | 8.3% |
| 4.168105263 | 1 | 8.3% |
| 3.978894737 | 1 | 8.3% |
| 3.477 | 1 | 8.3% |
| 3.430210526 | 1 | 8.3% |
| 3.176 | 1 | 8.3% |
Simiyu
Real number (ℝ≥0)
| Distinct | 12 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.394868421 |

| Minimum | 0.1952105263 |
|---|---|
| Maximum | 4.753578947 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 192.0 B |
Quantile statistics
| Minimum | 0.1952105263 |
|---|---|
| 5-th percentile | 0.2713421052 |
| Q1 | 1.166118421 |
| median | 2.681605263 |
| Q3 | 3.291078948 |
| 95-th percentile | 4.381721052 |
| Maximum | 4.753578947 |
| Range | 4.558368421 |
| Interquartile range (IQR) | 2.124960527 |
Descriptive statistics
| Standard deviation | 1.489299991 |
|---|---|
| Coefficient of variation (CV) | 0.6218713219 |
| Kurtosis | -1.109718615 |
| Mean | 2.394868421 |
| Median Absolute Deviation (MAD) | 1.302157895 |
| Skewness | -0.06066814636 |
| Sum | 28.73842105 |
| Variance | 2.218014463 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.046947368 | 1 | 8.3% |
| 3.091421053 | 1 | 8.3% |
| 0.1952105263 | 1 | 8.3% |
| 1.8 | 1 | 8.3% |
| 2.908473684 | 1 | 8.3% |
| 4.077473684 | 1 | 8.3% |
| 2.981052632 | 1 | 8.3% |
| 3.890052632 | 1 | 8.3% |
| 0.3336315789 | 1 | 8.3% |
| 1.205842105 | 1 | 8.3% |
| Other values (2) | 2 | 16.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.1952105263 | 1 | 8.3% |
| 0.3336315789 | 1 | 8.3% |
| 1.046947368 | 1 | 8.3% |
| 1.205842105 | 1 | 8.3% |
| 1.8 | 1 | 8.3% |
| 2.454736842 | 1 | 8.3% |
| 2.908473684 | 1 | 8.3% |
| 2.981052632 | 1 | 8.3% |
| 3.091421053 | 1 | 8.3% |
| 3.890052632 | 1 | 8.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4.753578947 | 1 | 8.3% |
| 4.077473684 | 1 | 8.3% |
| 3.890052632 | 1 | 8.3% |
| 3.091421053 | 1 | 8.3% |
| 2.981052632 | 1 | 8.3% |
| 2.908473684 | 1 | 8.3% |
| 2.454736842 | 1 | 8.3% |
| 1.8 | 1 | 8.3% |
| 1.205842105 | 1 | 8.3% |
| 1.046947368 | 1 | 8.3% |
Month
Categorical
| Distinct | 12 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 192.0 B |
| Value | Count |
|---|---|
| Jan | 1 |
| Nov | 1 |
| Apr | 1 |
| Sep | 1 |
| Mar | 1 |
| Other values (7) | 7 |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Characters and Unicode
| Total characters | 36 |
|---|---|
| Distinct characters | 22 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
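For the Month column, the character and category counts in the tables that follow can be reproduced with Python's `unicodedata` module. This sketch uses a hypothetical helper, `character_profile`. Note that `unicodedata` exposes general categories (such as `Lu` and `Ll`) but not scripts or Unicode blocks, so those two tables are not covered here. The long names in `CATEGORY_NAMES` are the standard Unicode names for the four category codes that occur in this report.

```python
import unicodedata
from collections import Counter

# The twelve Month values from the report.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Standard long names for the general-category codes seen in this report.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
}


def character_profile(values):
    """Count characters and Unicode general categories over all values."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, count in chars.items():
        categories[unicodedata.category(ch)] += count
    return chars, categories


chars, categories = character_profile(MONTHS)
print(sum(chars.values()), len(chars))  # 36 22
for code, count in categories.most_common():
    print(CATEGORY_NAMES.get(code, code), count)
# Lowercase Letter 24
# Uppercase Letter 12
```

Passing the Period values (`["2001-2019"] * 12`) through the same helper gives 96 `Nd` and 12 `Pd` characters, matching that variable's category table.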
Unique
| Unique | 12 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | Jan |
|---|---|
| 2nd row | Feb |
| 3rd row | Mar |
| 4th row | Apr |
| 5th row | May |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Jan | 1 | 8.3% |
| Nov | 1 | 8.3% |
| Apr | 1 | 8.3% |
| Sep | 1 | 8.3% |
| Mar | 1 | 8.3% |
| Jun | 1 | 8.3% |
| Dec | 1 | 8.3% |
| Jul | 1 | 8.3% |
| May | 1 | 8.3% |
| Oct | 1 | 8.3% |
| Other values (2) | 2 | 16.7% |
Length
Histogram of lengths of the category
Words
| Value | Count | Frequency (%) |
|---|---|---|
| jun | 1 | 8.3% |
| dec | 1 | 8.3% |
| may | 1 | 8.3% |
| jul | 1 | 8.3% |
| apr | 1 | 8.3% |
| sep | 1 | 8.3% |
| feb | 1 | 8.3% |
| jan | 1 | 8.3% |
| nov | 1 | 8.3% |
| oct | 1 | 8.3% |
| Other values (2) | 2 | 16.7% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| J | 3 | 8.3% |
| a | 3 | 8.3% |
| e | 3 | 8.3% |
| u | 3 | 8.3% |
| n | 2 | 5.6% |
| M | 2 | 5.6% |
| r | 2 | 5.6% |
| A | 2 | 5.6% |
| p | 2 | 5.6% |
| c | 2 | 5.6% |
| Other values (12) | 12 | 33.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Lowercase Letter | 24 | 66.7% |
| Uppercase Letter | 12 | 33.3% |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| a | 3 | 12.5% |
| e | 3 | 12.5% |
| u | 3 | 12.5% |
| n | 2 | 8.3% |
| r | 2 | 8.3% |
| p | 2 | 8.3% |
| c | 2 | 8.3% |
| b | 1 | 4.2% |
| y | 1 | 4.2% |
| l | 1 | 4.2% |
| Other values (4) | 4 | 16.7% |
Uppercase Letter
| Value | Count | Frequency (%) |
|---|---|---|
| J | 3 | 25.0% |
| M | 2 | 16.7% |
| A | 2 | 16.7% |
| F | 1 | 8.3% |
| S | 1 | 8.3% |
| O | 1 | 8.3% |
| N | 1 | 8.3% |
| D | 1 | 8.3% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Latin | 36 | 100.0% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
|---|---|---|
| J | 3 | 8.3% |
| a | 3 | 8.3% |
| e | 3 | 8.3% |
| u | 3 | 8.3% |
| n | 2 | 5.6% |
| M | 2 | 5.6% |
| r | 2 | 5.6% |
| A | 2 | 5.6% |
| p | 2 | 5.6% |
| c | 2 | 5.6% |
| Other values (12) | 12 | 33.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 36 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| J | 3 | 8.3% |
| a | 3 | 8.3% |
| e | 3 | 8.3% |
| u | 3 | 8.3% |
| n | 2 | 5.6% |
| M | 2 | 5.6% |
| r | 2 | 5.6% |
| A | 2 | 5.6% |
| p | 2 | 5.6% |
| c | 2 | 5.6% |
| Other values (12) | 12 | 33.3% |
Period
Categorical
| Distinct | 1 |
|---|---|
| Distinct (%) | 8.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 192.0 B |
| Value | Count |
|---|---|
| 2001-2019 | 12 |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 9 |
| Min length | 9 |
Characters and Unicode
| Total characters | 108 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 2001-2019 |
|---|---|
| 2nd row | 2001-2019 |
| 3rd row | 2001-2019 |
| 4th row | 2001-2019 |
| 5th row | 2001-2019 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 2001-2019 | 12 | 100.0% |
Length
Histogram of lengths of the category
Pie chart
Words
| Value | Count | Frequency (%) |
|---|---|---|
| 2001-2019 | 12 | 100.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 36 | 33.3% |
| 2 | 24 | 22.2% |
| 1 | 24 | 22.2% |
| - | 12 | 11.1% |
| 9 | 12 | 11.1% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 96 | 88.9% |
| Dash Punctuation | 12 | 11.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 36 | 37.5% |
| 2 | 24 | 25.0% |
| 1 | 24 | 25.0% |
| 9 | 12 | 12.5% |
Dash Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| - | 12 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 108 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 36 | 33.3% |
| 2 | 24 | 22.2% |
| 1 | 24 | 22.2% |
| - | 12 | 11.1% |
| 9 | 12 | 11.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 108 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 36 | 33.3% |
| 2 | 24 | 22.2% |
| 1 | 24 | 22.2% |
| - | 12 | 11.1% |
| 9 | 12 | 11.1% |
Correlations
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate shifts and positive rescalings of the two variables. As a result, points lying exactly on a non-horizontal line give r = +1 (rising line) or -1 (falling line), however steep the line is. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
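The sketch below translates that definition directly into code. `pearson_r` is a hypothetical helper, and the Lake Victoria and Simiyu values are transcribed from the value tables above (Jan to Dec). The 1/(n - 1) factors in the covariance and in the two standard deviations cancel, so plain sums are enough. The last lines check the location and scale invariance described above.

```python
import statistics

# Monthly values (Jan..Dec), transcribed from the value tables above.
LAKE_VICTORIA = [
    3.176, 3.477, 4.687052632, 7.004526316, 9.362789474, 3.430210526,
    1.764421053, 2.812631579, 3.978894737, 5.318421053, 5.118473684, 4.168105263,
]
SIMIYU = [
    2.908473684, 1.8, 2.981052632, 4.753578947, 4.077473684, 1.046947368,
    0.1952105263, 0.3336315789, 1.205842105, 2.454736842, 3.091421053, 3.890052632,
]


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    # The 1/(n - 1) factors cancel, so sums of deviation products suffice.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5  # ZeroDivisionError if a column is constant


r = pearson_r(LAKE_VICTORIA, SIMIYU)
print(f"Pearson r, Lake Victoria vs Simiyu: {r:.4f}")

# Invariance: shifting and positively rescaling one variable leaves r unchanged.
rescaled = [10 * x + 5 for x in LAKE_VICTORIA]
print(abs(pearson_r(rescaled, SIMIYU) - r) < 1e-12)  # True
```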
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
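Spearman's ρ is Pearson's r applied to ranks, and the following sketch makes that explicit. The helpers `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical. Tied values receive the average of the ranks they span, which is the usual convention for ρ with ties. The toy data y = x³ is monotonic but not linear, so ρ is exactly 1 while r is not.

```python
import statistics


def average_ranks(values):
    """1-based ranks; tied values all receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end are ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]                 # monotonic but not linear
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(round(spearman_rho(x, y), 6))     # 1.0
print(round(pearson_r(x, y), 3))        # 0.943
```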
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, first count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
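The pair-counting definition above corresponds to Kendall's τ-a, sketched below with a hypothetical `kendall_tau_a` helper. Pairs tied in either variable count as neither concordant nor discordant. Libraries such as SciPy (`scipy.stats.kendalltau`) default instead to τ-b, which adjusts the denominator for ties; the two coincide when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, i.e. Kendall's tau-a."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # they move in opposite directions
        # sign == 0: a tie in x or y, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 0.3333333333333333 (2 concordant, 1 discordant)
```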
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
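The bias-corrected Cramér's V described under Correlations can be sketched as follows. This is not pandas-profiling's implementation. `cramers_v_bias_corrected` is a hypothetical helper, and it assumes a plain Pearson chi-squared statistic without continuity correction. Take φ² = χ²/n for an r × k contingency table. Bergsma's correction then uses:

- φ̃² = max(0, φ² - (k - 1)(r - 1)/(n - 1))
- r̃ = r - (r - 1)²/(n - 1)
- k̃ = k - (k - 1)²/(n - 1)
- Ṽ = sqrt(φ̃² / min(k̃ - 1, r̃ - 1))

```python
import math
from collections import Counter


def cramers_v_bias_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma 2013) for two nominal sequences.

    Assumes a plain Pearson chi-squared statistic (no continuity correction).
    Returns NaN when either variable has a single category, because the
    corrected denominator is then zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    row_totals, col_totals = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells read as 0

    # Pearson chi-squared over the full r x k contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bergsma's corrections to phi^2 and to the numbers of rows and columns.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    if denominator <= 0:
        return math.nan
    return math.sqrt(phi2_corr / denominator)


# Perfect association between two binary variables.
x = ["a"] * 50 + ["b"] * 50
y = ["p"] * 50 + ["q"] * 50
print(round(cramers_v_bias_corrected(x, y), 6))  # 1.0
```

A constant column such as Period has a single category, which makes the corrected denominator zero. The sketch returns NaN in that case rather than guessing a value.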
First rows
| | Lake Victoria | Simiyu | Month | Period |
|---|---|---|---|---|
| 0 | 3.176000 | 2.908474 | Jan | 2001-2019 |
| 1 | 3.477000 | 1.800000 | Feb | 2001-2019 |
| 2 | 4.687053 | 2.981053 | Mar | 2001-2019 |
| 3 | 7.004526 | 4.753579 | Apr | 2001-2019 |
| 4 | 9.362789 | 4.077474 | May | 2001-2019 |
| 5 | 3.430211 | 1.046947 | Jun | 2001-2019 |
| 6 | 1.764421 | 0.195211 | Jul | 2001-2019 |
| 7 | 2.812632 | 0.333632 | Aug | 2001-2019 |
| 8 | 3.978895 | 1.205842 | Sep | 2001-2019 |
| 9 | 5.318421 | 2.454737 | Oct | 2001-2019 |
Last rows
| | Lake Victoria | Simiyu | Month | Period |
|---|---|---|---|---|
| 2 | 4.687053 | 2.981053 | Mar | 2001-2019 |
| 3 | 7.004526 | 4.753579 | Apr | 2001-2019 |
| 4 | 9.362789 | 4.077474 | May | 2001-2019 |
| 5 | 3.430211 | 1.046947 | Jun | 2001-2019 |
| 6 | 1.764421 | 0.195211 | Jul | 2001-2019 |
| 7 | 2.812632 | 0.333632 | Aug | 2001-2019 |
| 8 | 3.978895 | 1.205842 | Sep | 2001-2019 |
| 9 | 5.318421 | 2.454737 | Oct | 2001-2019 |
| 10 | 5.118474 | 3.091421 | Nov | 2001-2019 |
| 11 | 4.168105 | 3.890053 | Dec | 2001-2019 |